Avoid inconsistent quantile computation between architectures #80
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do?
For particular sketches and quantile values,
GetValueAtQuantile
can return very different results on different architectures, resulting from slightly different rounding on internal float values used in the computation. This PR exposes the issues in the tests and addresses the problem.The problem was initially highlighted with a sketch containing 21 positive values, 14 of which being 0, and computing the 0.95 quantile of the sketch. In this case, we get:
The number of zeroes in the sketch is then subtracted from this rank to find the key at rank 5 (19-14) in the positive-values store of the sketch.
0.95 does not have an exact representation as a float64 (
0x1.e666666666666p-01
). But when it is multiplied by 20, the rounding on the result lead to the value 19, which is represented in an exact way as float64 (0x1.3p+04
). This is the case on both ARM64 and AMD64 systems on which I tested this. InGetValueAtQuantile
, I could validate thatrank
for this particular sketch was computed consistently to this value 19 (0x1.3p+04
).I was surprised to see that when
GetValueAtQuantile
computesrank - s.zeroCount
just after, the value was different on ARM64 and AMD64:0x1.4p+02
).0x1.3ffffffffffffp+02
.This is enough to get a different bin for this rank, resulting in a very different 0.95 quantile for the sketch I was testing with.
Disassembling the code generated for
GetValueAtQuantile
showed that therank
variable was actually not reused by the compiler to computerank - s.zeroCount
. Instead, the fused multiply-add fnmsub instruction was used to computequantile * (count - 1) - s.zeroCount
directly. This instruction does not perform an intermediate rounding after the multiplication, which explains how the end result can be slightly different.In
GetValueAtQuantile
, we want to make sure that the rank computed can be safely used as a reference point when later computing a rank to lookup in the negative-value or positive-value store. This can be achieved by an explicit floating point conversion:As per the go specification:
This problem is not specific to the sketch I was testing with of course. It can happen whenever a sketch contains zero values, the computed quantile doesn't have an exact float64 representation (requiring rounding on the rank computation), and the rank to look for in the positive-value store falls exactly at the boundary between 2 bins.
The PR adds a test which exposes the issue on ARM64 when the explicit float64 conversion added by this PR is removed. It uses sketches built from values following a linear distribution, but with zeroes added once every 2 values. For the right test size (21) and quantile (0.95), the lower and upper quantiles computed on the
Dataset
are equal (becausemath.Floor(rank)
andmath.Ceil(rank)
are equal), but the effective rank used byGetValueAtQuantile
is rounded down, selecting the bin just below the expected one.